'Uncanny Valley': ICE's Secret Expansion Plans, Palantir Workers' Ethical Concerns, and AI Assistants

WIRED

In this episode of Uncanny Valley, our hosts dive into WIRED's scoop about a secret Trump administration campaign extending right into your backyard. This week, hosts Brian Barrett, Leah Feiger, and Zoë Schiffer discuss WIRED's big scoop on ICE's startling plans to expand to nearly every state in the US. Plus, a WIRED writer lets the viral AI assistant OpenClaw run his life for a week to give listeners a peek at what AI agents can and can't do. Read more: ICE Is Expanding Across the US at Breakneck Speed. Write to us at uncannyvalley@wired.com.

I want to continue a conversation that we started yesterday in Slack, after work hours for some of us. This is about the men's short program, but very specifically I want to pick up on the conversation where Zoë had very strong feelings about the results of men's figure skating.

I feel like we need to back up, because you and Leah authentically care about the Olympics so much, and I think you just know more about sports than I do. I have deeply never engaged with sports, ever, just as a whole rule, as a category. It doesn't exist in my life.

Say the lines, say the lines, Zoë, or I'm going to read them verbatim from Slack.

Wait, I don't even know what you're talking about. I was merely surprised when I watched, because when the Americans went, I thought, wow, that guy basically fell over and was clumping around the ice, and then Japan went, and they were sailing around like little swans, and then when the gold medal came, it went to the Americans. I couldn't believe what had happened. No one else seemed outraged.

For a little backup for our non-ice-skating Olympic fans, I was referring to Ilia Malinin, who a number of publications and sports experts say might actually be one of the greatest figure skaters of all time.


Mutual Information Collapse Explains Disentanglement Failure in $β$-VAEs

Vu, Minh, Wan, Xiaoliang, Wei, Shuangqing

arXiv.org Machine Learning

The $β$-VAE is a foundational framework for unsupervised disentanglement, using $β$ to regulate the trade-off between latent factorization and reconstruction fidelity. Empirically, however, disentanglement performance exhibits a pervasive non-monotonic trend: benchmarks such as MIG and SAP typically peak at intermediate $β$ and collapse as regularization increases. We demonstrate that this collapse is a fundamental information-theoretic failure, where strong Kullback-Leibler pressure promotes marginal independence at the expense of the latent channel's semantic informativeness. By formalizing this mechanism in a linear-Gaussian setting, we prove that for $β> 1$, stationarity-induced dynamics trigger a spectral contraction of the encoder gain, driving latent-factor mutual information to zero. To resolve this, we introduce the $λβ$-VAE, which decouples regularization pressure from informational collapse via an auxiliary $L_2$ reconstruction penalty $λ$. Extensive experiments on dSprites, Shapes3D, and MPI3D-real confirm that $λ> 0$ stabilizes disentanglement and restores latent informativeness over a significantly broader range of $β$, providing a principled theoretical justification for dual-parameter regularization in variational inference backbones.
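The dual-parameter objective described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes a diagonal-Gaussian posterior, uses squared error as both the reconstruction term and the auxiliary $L_2$ penalty, and treats the encoder/decoder outputs (`mu`, `logvar`, `x_hat`) as given.

```python
import math

def kl_diag_gaussian(mu, logvar):
    # KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dimensions
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, logvar))

def lambda_beta_vae_loss(x, x_hat, mu, logvar, beta=4.0, lam=1.0):
    # Squared-error reconstruction term (stand-in for the likelihood term)
    sq_err = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    # beta-VAE objective plus the auxiliary L2 reconstruction penalty
    # weighted by lambda, which counteracts the KL-driven collapse
    return sq_err + beta * kl_diag_gaussian(mu, logvar) + lam * sq_err
```

With $λ = 0$ this reduces to the standard $β$-VAE loss; $λ > 0$ adds reconstruction pressure that is not scaled down as $β$ grows.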




Rare, deep-sea encounter: California scientists observe 'extraordinary' seven-arm octopus

Los Angeles Times

On November 6, 2025, MBARI Senior Scientist Steven Haddock and researchers in MBARI's Biodiversity and Biooptics Team observed a seven-arm octopus (Haliphron atlanticus) during an expedition in Monterey Bay with MBARI's remotely operated vehicle at a depth of approximately 700 meters. California scientists captured rare footage of a seven-arm octopus eating a jellyfish.


Unsupervised decoding of encoded reasoning using language model interpretability

Fang, Ching, Marks, Samuel

arXiv.org Artificial Intelligence

As large language models become increasingly capable, there is growing concern that they may develop reasoning processes that are encoded or hidden from human oversight. To investigate whether current interpretability techniques can penetrate such encoded reasoning, we construct a controlled testbed by fine-tuning a reasoning model (DeepSeek-R1-Distill-Llama-70B) to perform chain-of-thought reasoning in ROT-13 encryption while maintaining intelligible English outputs. We evaluate mechanistic interpretability methods--in particular, logit lens analysis--on their ability to decode the model's hidden reasoning process using only internal activations. We show that logit lens can effectively translate encoded reasoning, with accuracy peaking in intermediate-to-late layers. Finally, we develop a fully unsupervised decoding pipeline that combines logit lens with automated paraphrasing, achieving substantial accuracy in reconstructing complete reasoning transcripts from internal model representations. These findings suggest that current mechanistic interpretability techniques may be more robust to simple forms of encoded reasoning than previously understood. Our work provides an initial framework for evaluating interpretability methods against models that reason in non-human-readable formats, contributing to the broader challenge of maintaining oversight over increasingly capable AI systems.
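The testbed's encoding scheme is ordinary ROT-13, which rotates each Latin letter 13 places and is its own inverse. A quick illustration using Python's standard library (the chain-of-thought string here is a made-up example, not from the paper):

```python
import codecs

def rot13(text: str) -> str:
    # ROT-13 shifts each Latin letter 13 places; applying it twice is the identity
    return codecs.encode(text, "rot_13")

cot = "First add the two numbers, then check the parity."
encoded = rot13(cot)
assert rot13(encoded) == cot  # decoding is the same operation
```

Because ROT-13 is a fixed character-level substitution, a model can learn it exactly, which makes it a clean first test case for whether logit-lens readouts recover the underlying English.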


LLM & HPC: Benchmarking DeepSeek's Performance in High-Performance Computing Tasks

Nader, Noujoud, Diehl, Patrick, Brandt, Steve, Kaiser, Hartmut

arXiv.org Artificial Intelligence

Large Language Models (LLMs), such as GPT-4 and DeepSeek, have been applied to a wide range of domains in software engineering. However, their potential in the context of High-Performance Computing (HPC) remains largely unexplored. This paper evaluates how well DeepSeek, a recent LLM, performs in generating a set of HPC benchmark codes: a conjugate gradient solver, the parallel heat equation, parallel matrix multiplication, DGEMM, and the STREAM triad operation. We analyze DeepSeek's code generation capabilities for traditional HPC languages like C++, Fortran, Julia, and Python. The evaluation includes testing for code correctness, performance, and scaling across different configurations and matrix sizes. We also provide a detailed comparison between DeepSeek and another widely used tool: GPT-4. Our results demonstrate that while DeepSeek generates functional code for HPC tasks, it lags behind GPT-4 in terms of scalability and execution efficiency of the generated code.
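Of the benchmark kernels listed, the STREAM triad is the simplest: `a[i] = b[i] + scalar * c[i]` over three arrays, used to measure sustainable memory bandwidth. A plain-Python reference version (real HPC runs would use compiled code and timed, multi-gigabyte arrays):

```python
def stream_triad(a, b, c, scalar):
    # STREAM triad kernel: a[i] = b[i] + scalar * c[i] for every element
    for i in range(len(a)):
        a[i] = b[i] + scalar * c[i]
    return a
```

The kernel moves three array elements per fused multiply-add, which is why its runtime is dominated by memory traffic rather than arithmetic.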


Exploring Spiking Neural Networks for Binary Classification in Multivariate Time Series at the Edge

Ghawaly, James, Nicholson, Andrew, Schuman, Catherine, Diez, Dalton, Young, Aaron, Witherspoon, Brett

arXiv.org Artificial Intelligence

We present a general framework for training spiking neural networks (SNNs) to perform binary classification on multivariate time series, with a focus on step-wise prediction and high precision at low false alarm rates. The approach uses the Evolutionary Optimization of Neuromorphic Systems (EONS) algorithm to evolve sparse, stateful SNNs by jointly optimizing their architectures and parameters. Inputs are encoded into spike trains, and predictions are made by thresholding a single output neuron's spike counts. We also incorporate simple voting ensemble methods to improve performance and robustness. To evaluate the framework, we apply it with application-specific optimizations to the task of detecting low signal-to-noise ratio radioactive sources in gamma-ray spectral data. The resulting SNNs, with as few as 49 neurons and 66 synapses, achieve a 51.8% true positive rate (TPR) at a false alarm rate of 1/hr, outperforming PCA (42.7%) and deep learning (49.8%) baselines. A three-model any-vote ensemble increases TPR to 67.1% at the same false alarm rate. Hardware deployment on the microCaspian neuromorphic platform demonstrates 2 mW power consumption and 20.2 ms inference latency. We also demonstrate generalizability by applying the same framework, without domain-specific modification, to seizure detection in EEG recordings. An ensemble achieves 95% TPR with a 16% false positive rate, comparable to recent deep learning approaches with a significant reduction in parameter count.
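The decision rule described above is simple to state: each SNN fires an alarm when its output neuron's spike count crosses a threshold, and the any-vote ensemble flags a detection if any member does. A schematic sketch (thresholds and counts are hypothetical; the real pipeline operates on evolved SNNs):

```python
def predict(spike_count, threshold):
    # Single-model rule: alarm if the output neuron's spike count reaches the threshold
    return spike_count >= threshold

def any_vote(spike_counts, thresholds):
    # "Any-vote" ensemble: flag a detection if at least one member model fires
    return any(predict(c, t) for c, t in zip(spike_counts, thresholds))
```

Any-vote trades precision for recall: it can only raise the true positive rate relative to its members, which matches the reported TPR jump from 51.8% to 67.1%, at the cost of requiring each member's threshold to be tuned for a low individual false alarm rate.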


FATHOMS-RAG: A Framework for the Assessment of Thinking and Observation in Multimodal Systems that use Retrieval Augmented Generation

Hildebrand, Samuel, Taylor, Curtis, Oesch, Sean, Ghawaly, James M Jr, Sadovnik, Amir, Shivers, Ryan, Schreiber, Brandon, Kurian, Kevin

arXiv.org Artificial Intelligence

Retrieval-augmented generation (RAG) has emerged as a promising paradigm for improving factual accuracy in large language models (LLMs). We introduce a benchmark designed to evaluate RAG pipelines as a whole, assessing a pipeline's ability to ingest, retrieve, and reason about several modalities of information, differentiating it from existing benchmarks that focus on particular aspects such as retrieval. We present (1) a small, human-created dataset of 93 questions designed to evaluate a pipeline's ability to ingest textual data, tables, images, and data spread across these modalities in one or more documents; (2) a phrase-level recall metric for correctness; (3) a nearest-neighbor embedding classifier to identify potential pipeline hallucinations; (4) a comparative evaluation of 2 pipelines built with open-source retrieval mechanisms and 4 closed-source foundation models; and (5) a third-party human evaluation of the alignment of our correctness and hallucination metrics. We find that closed-source pipelines significantly outperform open-source pipelines in both correctness and hallucination metrics, with wider performance gaps in questions relying on multimodal and cross-document information. Human evaluation of our metrics showed average agreement of 4.62 for correctness and 4.53 for hallucination detection on a 1-5 Likert scale (5 indicating "strongly agree"). Research sponsored by the Laboratory Directed Research and Development Program of Oak Ridge National Laboratory, managed by UT-Battelle, LLC, for the U.S. Department of Energy. Notice: This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy.
The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes.
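A phrase-level recall metric of the kind item (2) describes can be sketched as follows. This is an illustrative guess at the shape of such a metric, not the paper's definition: it assumes each question comes with a list of reference key phrases and scores the fraction found verbatim (case-insensitively) in the pipeline's answer.

```python
def phrase_recall(reference_phrases, answer):
    # Fraction of reference key phrases that appear in the answer,
    # matched case-insensitively; empty reference lists score 0.0
    answer_lower = answer.lower()
    hits = sum(1 for p in reference_phrases if p.lower() in answer_lower)
    return hits / len(reference_phrases) if reference_phrases else 0.0
```

Phrase-level matching is stricter than token overlap but more forgiving than exact-answer matching, which suits free-form RAG outputs where the key facts can be embedded in longer sentences.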